Clusterization by the K-means method when K is unknown
Similar resources
On Lloyd’s k-means Method
We present polynomial upper and lower bounds on the number of iterations performed by Lloyd’s method for k-means clustering. Our upper bounds are polynomial in the number of points, number of clusters, and the spread of the point set. We also present a lower bound, showing that in the worst case the k-means heuristic needs to perform Ω(n) iterations, for n points on the real line and two center...
The Complexity of the k-means Method
The k-means method is a widely used technique for clustering points in Euclidean space. While it is extremely fast in practice, its worst-case running time is exponential in the number of data points. We prove that the k-means method can implicitly solve PSPACE-complete problems, providing a complexity-theoretic explanation for its worst-case running time. Our result parallels recent work on th...
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...
Quantization and the method of k-means
The theory developed in the statistical literature for the method of k-means can be applied to the study of optimal k-level vector quantizers. In this paper, I describe some of this theory, including a consistency theorem (Section II) and a central limit theorem (Section IV) for k-means cluster centers. These results help to explain the behavior of optimal vector quantizers constructed from l...
Learning the k in k-means
When clustering a dataset, the right number k of clusters to use is often not obvious, and choosing k automatically is a hard algorithmic problem. In this paper we present an improved algorithm for learning k while clustering. The G-means algorithm is based on a statistical test for the hypothesis that a subset of data follows a Gaussian distribution. G-means runs k-means with increasing k in a...
Journal
Journal title: ITM Web of Conferences
Year: 2019
ISSN: 2271-2097
DOI: 10.1051/itmconf/20192401013